Method 1 - Babbitt Score 



(1100) For a given variable, for all
observations, convert provided values
into proxy values, if necessary.




(1200) For a given variable, for all
observations, segregate proxy values
associated with the variable into
categories (e.g., Top 2 box, Top 3 box,
etc.)



(1300) For each category, determine the
values' distribution by dividing the number
of provided values in the category by the
total number of provided values



(1400) Calculate a top category score
by adding the top category distribution
(%) to the bottom category distribution
(%), subtracting an ideal
distribution of neutrals (%), and
multiplying the absolute value of the
result by 100



(1500) Calculate a difference score by
subtracting the bottom category
distribution (%) from the top category
distribution (%) and multiplying the
absolute value of the result by 100



(1600) Calculate an effectiveness
(Babbitt) score for the variable by
adding the top category score to the
difference score



(1700) Evaluate the variable based on 
the effectiveness score 
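
Steps (1300) through (1600) can be sketched in a few lines of Python. This is only an illustration: the function name `babbitt_score`, the choice of which values form the top and bottom categories, and the `ideal_neutral` share are assumptions supplied by the caller, not values given in the flow chart.

```python
from collections import Counter


def babbitt_score(values, top, bottom, ideal_neutral):
    """Effectiveness (Babbitt) score for one variable, steps (1300)-(1600).

    values: provided (or proxy) values for the variable
    top, bottom: sets of values forming the top and bottom categories
    ideal_neutral: ideal share of neutral answers as a fraction (assumed input)
    """
    counts = Counter(values)
    total = len(values)
    top_share = sum(counts[v] for v in top) / total        # (1300)
    bottom_share = sum(counts[v] for v in bottom) / total  # (1300)
    top_score = abs(top_share + bottom_share - ideal_neutral) * 100  # (1400)
    diff_score = abs(top_share - bottom_share) * 100                 # (1500)
    return top_score + diff_score                                    # (1600)


# Ten answers on a 1-5 scale, scored with a hypothetical "Top 2 box" split.
answers = [5, 5, 4, 3, 3, 3, 2, 1, 1, 1]
score = babbitt_score(answers, top={4, 5}, bottom={1, 2}, ideal_neutral=0.2)
```

In this toy case the top share is 0.3 and the bottom share is 0.4, so the score is roughly 50 + 10 = 60, up to floating-point rounding.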



Method 2 - Bestfit Clustering



(2100) For each observation, obtain dataset, 
each dataset having observation IDs, variables, 
possible values, provided values, and (where 
applicable) proxy values for provided values 



(2200) Specify a number of clusters,
any desired variable weights, and a
maximum number of iterations for the
clustering solution



The # of clusters is an integer > 0.
The # of iterations is an integer > 0.



(2300) Develop initial cluster
assignments: Select only 1 path within
(2130)



(2350) Read specified initial cluster
assignments (already present in
dataset; cf. Method 3)



Throughout Method 2, variable
weights (where applicable) are
incorporated within fitness score
calculation (cf. Method 10)



(2360) Systematic Search: Identify
the pair of variables that creates a
clustering solution that maximizes
fitness score using the specified
number of clusters



(2361) Discover the 1 variable that
creates a clustering solution that
maximizes fitness score using the
specified # of clusters



(2362) Hold the variable from (2361)
constant and discover the variable that
in tandem creates a clustering solution
that maximizes fitness score using the
specified # of clusters



(2370) Thorough Search: Identify any
two variables that together create a
clustering solution that maximizes
fitness score using the specified
number of clusters



(2371) Discover the 1 variable that
creates a clustering solution that
maximizes fitness score using the
specified # of clusters



(2372) Hold variable from (2371) or
(2373) constant and discover the
variable that in tandem creates a
clustering solution that maximizes
fitness score using the specified # of
clusters



(2373) Hold variable from (2372)
constant and find the variable that in
tandem creates a clustering solution
that maximizes fitness score



(2400) Store observation IDs, corresponding
cluster IDs, fitness score



Repeat this process by
cycling through all
possible combinations
of variable pairings until
fitness score as
calculated in (2373) is
maximized
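
The two search paths, (2360) and (2370), can be sketched as follows. The flow chart does not say how a clustering solution is built from a set of variables, so the `fitness` callable here is a hypothetical stand-in for "build the solution from these variables and score it per Method 10"; the function names are likewise illustrative.

```python
def systematic_search(variables, fitness):
    """(2361)-(2362): best single variable, then its best partner."""
    first = max(variables, key=lambda v: fitness((v,)))
    partner = max((v for v in variables if v != first),
                  key=lambda v: fitness((first, v)))
    return first, partner


def thorough_search(variables, fitness):
    """(2371)-(2373): alternate which member is held until no improvement."""
    held, other = systematic_search(variables, fitness)
    best = fitness((held, other))
    improved = True
    while improved:
        improved = False
        # Hold the most recently found variable and re-optimize its partner.
        cand = max((v for v in variables if v != other),
                   key=lambda v: fitness((other, v)))
        score = fitness((other, cand))
        if score > best:
            held, other, best = other, cand, score
            improved = True
    return (held, other), best


# Toy fitness table (made-up numbers) keyed by the set of variables used.
scores = {
    frozenset(["a"]): 5, frozenset(["b"]): 4,
    frozenset(["c"]): 3, frozenset(["d"]): 1,
    frozenset(["a", "b"]): 8, frozenset(["a", "c"]): 7,
    frozenset(["a", "d"]): 6, frozenset(["b", "c"]): 10,
    frozenset(["b", "d"]): 5, frozenset(["c", "d"]): 4,
}


def toy_fitness(vs):
    return scores[frozenset(vs)]
```

With this table the systematic path stops at the pair (a, b), while the thorough path keeps swapping the held variable and reaches (b, c), which is why the thorough path can find a better pair at the cost of more fitness evaluations.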







Fig. 2a 



Method 2 - Bestfit Clustering - Continued




FROM (2400) 



(2450) For each cluster and variable,
calculate the mode of provided values,
and (where applicable) proxy values
for provided values



From (2591)






(2460) Store cluster IDs, variable IDs,
and corresponding modes from (2450)






(2470) Select a fraction of 
observations, divided evenly among all 
specified clusters from the dataset 



(2480) Randomly re-assign the
observations from (2470) to different
clusters



(2490) For each cluster and variable,
calculate the mode of provided
values, and (where applicable) proxy
values for provided values



(2500) Calculate fitness score for the
clustering solution developed in (2480)



It is preferable that
observations be chosen
randomly; furthermore, it is
preferable that the exact
fraction never exceed 10% of
the entire dataset's population
of observations



(2505) Store fitness score, cluster
assignments, and observation IDs



(2510) Select one random observation
from a random cluster and change its
cluster assignment



(2520) Calculate fitness score of 
(2510) 



From (2550) 



(2530) Conditional function: Compare 
fitness score of (2520) with (2505) 



Only proceed if looping
from (2550)



(2531) If fitness score of (2520)
< or = fitness score of (2505),
cycle through all possible cluster assignments
for observation selected in (2510) until fitness
is maximized



(2536) If fitness 
score of (2520) 
> (2505), proceed 
to (2540) 



(2532) Conditional function: Compare
maximum fitness score of (2531) with (2505)



(2533) If fitness score of (2520)
< or = (2505), return selected
observation to original cluster
assignment



(2535) If fitness
score of (2520)
> (2505), proceed
to (2540)



(2534) Return to
(2510)



(2540) Store
observation IDs
and new cluster
assignments



(2550) Replace
(2505) with (2540)
and return to (2510)



Hold the other observations in
their current cluster
assignments from (2505)



(2551) If fitness score of 
(2520) = (2505), 
proceed to (2560) 



(2552) If fitness score of
(2520) < or > (2505),
return to (2540)
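
One pass of the reassignment loop, (2510) through (2550), might look like the sketch below. It assumes an unweighted fitness score (a count of values that equal their cluster's mode, in the spirit of Method 10), at least two clusters, and integer cluster labels starting at 0; the names `fitness` and `reassign_step` are illustrative only.

```python
import random
from collections import Counter


def fitness(data, assign, n_clusters):
    """Count of values equal to their cluster's mode (unweighted)."""
    total = 0
    n_vars = len(data[0])
    for c in range(n_clusters):
        members = [row for row, a in zip(data, assign) if a == c]
        if not members:
            continue
        for k in range(n_vars):
            # The count of the most common value is the number of mode matches.
            total += Counter(r[k] for r in members).most_common(1)[0][1]
    return total


def reassign_step(data, assign, n_clusters, rng):
    """One pass of (2510)-(2550); returns the assignments to keep."""
    best = fitness(data, assign, n_clusters)              # stored in (2505)
    i = rng.randrange(len(data))                          # (2510)
    trial = list(assign)
    trial[i] = rng.choice([c for c in range(n_clusters) if c != assign[i]])
    if fitness(data, trial, n_clusters) > best:           # (2536)
        return trial
    # (2531): try every cluster for observation i, holding the others fixed.
    scored = []
    for c in range(n_clusters):
        cand = list(assign)
        cand[i] = c
        scored.append((fitness(data, cand, n_clusters), cand))
    top_score, top = max(scored, key=lambda s: s[0])
    # (2535) keep an improvement, otherwise (2533) restore the original.
    return top if top_score > best else list(assign)
```

Because a move is kept only when it strictly raises the score, repeated calls never lower the fitness of the stored solution.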







Fig. 2b

Method 2 - Bestfit Clustering - Continued




FROM (2551) 



To (2470) 






(2560) End Iteration 



Steps (2510)
through (2560)
represent 1 iteration



(2570) Store Iteration ID,
observation IDs, corresponding
cluster assignments



Iteration ID is a positive integer <= the
total # of specified iterations. Its value
increases serially in the form
x + 1, where x = previous iteration ID.



(2580) Conditional function: 
compare Iteration ID from 
(2570) with total # of Iterations 
specified in (2200) 



(2581) If Iteration
ID from (2580) <
total # of iterations
specified in (2200),
return to (2470)



(2582) If Iteration 
ID from (2580) = 
total # of iterations 
specified in (2200), 
proceed to (2600) 



(2600) Store Iteration ID that produced maximum
fitness score, maximum fitness score, observation
IDs, and corresponding cluster assignments that
produced maximum fitness score



(2610) Dump data from (2600)
into a file of usable format (e.g.
ASCII, .csv, .txt, .prn)



(2620) Terminate 



Method 3 - Champion/Challenger 
Clustering Refinement 



(3010) For each observation, obtain dataset, 
dataset having variables, possible values, 
provided values, and (where applicable) 
proxy values for provided values 




(3020) Append initial cluster assignments 
to dataset (so they correspond to 
Observation IDs) 






(3030) Specify maximum number of 
iterations 






(3040) Execute (2450) to (2610) of 
Method 2 






(3050) Terminate 



Method 4 - Composition Analysis



(4010) For each observation, obtain dataset
having a cluster assignment for the
observation and having a proxy value for
each of the variables in the dataset, each
variable having possible values



(4020) For each observation, estimate a
purposeful probability (a measure of a
probability that an observation in a particular
cluster provides an answer to a question in
a non-random manner) that a particular
possible value for a particular variable will
be provided by observations assigned to a
particular cluster



(4021) Create probability variables
for each cluster, variable, answer
combination as $P_m(k, \ell)$



(4022) Estimate the probability that answer
value $\ell$ is given by the observations in
cluster $m$ for a variable $k$ that has
$L_k$ possible answers



(4023) Define $\delta$ within a constraint that
allows for usable output



(4025) For each observation,
store and/or output the
purposeful probability






(4024) Computational process is executed
across all cluster, variable, answer combinations:

$$P_m(k, \ell) = \frac{N_m(k, \ell)}{N_m} (1 - \delta L_k) + \delta$$



(4030) For each observation and 
each possible value, calculate a 
serendipity probability (a measure 
of a probability that an observation 
in a particular cluster will be 
associated with any of the 
possible values for a particular 
variable) 



If an observation $i$ in cluster $m$
selected responses
"randomly," then the
probabilities of selecting his
responses should be
described as $\frac{1}{L_k}$



(4050) For each observation, assume that
before the observation is made, the observation has
an equal probability of being in any cluster



(4055) For each observation, assume that 
the purposeful probabilities are true 



(4060) For each observation, calculate a Bayes
probability that a particular observation can be
in each cluster conditional upon the
observation's proxy value



(4065) For each 
observation, store and/or 
output the Bayes probability 
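
Steps (4050) through (4060) amount to Bayes' rule with a uniform prior over clusters, which can be sketched as below. The layout of the `probs` table (keyed by cluster, variable index, answer value) and the function name `bayes_probs` are assumptions made for the example; the purposeful probabilities themselves are taken as given, per (4055).

```python
import math


def bayes_probs(row, probs, clusters):
    """Posterior probability of each cluster for one observation.

    row: the observation's answer to each variable, in variable order
    probs[m, k, v]: purposeful probability of answer v to variable k in cluster m
    clusters: iterable of cluster labels, each given an equal prior (4050)
    """
    clusters = list(clusters)
    likelihood = {
        m: math.prod(probs[m, k, v] for k, v in enumerate(row))
        for m in clusters
    }
    total = sum(likelihood.values())
    # Equal priors cancel, so the posterior is the normalized likelihood.
    return {m: likelihood[m] / total for m in clusters}


# Hypothetical purposeful probabilities for one two-answer variable.
toy_probs = {
    (0, 0, "a"): 0.98, (0, 0, "b"): 0.02,
    (1, 0, "a"): 0.02, (1, 0, "b"): 0.98,
}
posterior = bayes_probs(("a",), toy_probs, clusters=[0, 1])
```

Here an observation answering "a" gets a posterior of about 0.98 for cluster 0 and about 0.02 for cluster 1.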



(4035) For each observation, 
calculate a ratio of purposeful 
probability to serendipity probability 



(4040) For each observation, calculate a log of 
the ratio to obtain a composition analysis score 



(4045) For each observation, store and/or 
output the composition analysis score 
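
A minimal sketch of the composition analysis score follows, assuming the purposeful probability $P_m(k, \ell) = \frac{N_m(k, \ell)}{N_m}(1 - \delta L_k) + \delta$ with $\delta = \min\{0.02, \frac{1}{2 L_k}\}$ and a serendipity probability of $\frac{1}{L_k}$ per variable, multiplied across variables for one observation. The function names and the `levels` argument (the list of possible values for each variable) are illustrative, not taken from the flow chart.

```python
import math
from collections import Counter


def purposeful_probs(data, assign, levels):
    """Smoothed share of each answer value within each cluster, per (4024)."""
    probs = {}
    for m in set(assign):
        members = [row for row, a in zip(data, assign) if a == m]
        for k, values in enumerate(levels):
            n_levels = len(values)
            delta = min(0.02, 1 / (2 * n_levels))
            counts = Counter(row[k] for row in members)
            for v in values:
                share = counts[v] / len(members)
                probs[m, k, v] = share * (1 - delta * n_levels) + delta
    return probs


def composition_score(row, m, probs, levels):
    """(4035)-(4040): log of purposeful over serendipity probability."""
    # Dividing by 1 / L_k is the same as multiplying by L_k.
    return sum(math.log(probs[m, k, v] * len(levels[k]))
               for k, v in enumerate(row))


data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
assign = [0, 0, 1, 1]
levels = [["a", "b"], ["x", "y"]]
probs = purposeful_probs(data, assign, levels)
```

A positive score means the observation's answers are more typical of its cluster than random answering would be; a negative score means the opposite. The smoothing term keeps every probability above zero, so the logarithm is always defined.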



If $\delta = 0$, the resulting
over-precision of the
calculated probabilities
compromises
computational efficiency



$\delta = \min\{0.02, \frac{1}{2 L_k}\}$
is a value that produces
meaningful results



Let $N_m$ = total # of observations in
cluster $m$; $N_m(k, \ell)$ = the # of
observations in cluster $m$ who give the
$\ell$-th answer value to variable $k$; and
$\delta = \min\{0.02, \frac{1}{2 L_k}\}$



If observation $i$ in cluster $m$
"purposefully" and "logically"
selected his responses, then the
probabilities of selecting his
responses should be described by
$P_m(k, \ell)$

Calculate $\prod_k P_m(k, \ell)$

NB: $\prod$ means the product calculated for each
$k$, where $\ell$ is the answer value given by observation $i$;
$\Sigma$ retains its customary function



(4080) For each observation, calculate a
percent of proxy values for the variables
that equals a mode of that observation's
cluster's proxy values for the
corresponding variables



(4085) For each 
observation, output 
the calculated 
percent 
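
Step (4080) can be sketched as follows. The function name `mode_match_percent` is illustrative, and ties between equally common values are broken by whichever value `Counter.most_common` returns first, which is an assumption the flow chart does not address.

```python
from collections import Counter


def mode_match_percent(data, assign):
    """Percent of each observation's values that equal its cluster's modes."""
    modes = {}
    for m in set(assign):
        members = [row for row, a in zip(data, assign) if a == m]
        # zip(*members) yields one column of values per variable.
        modes[m] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return [
        100 * sum(v == mode for v, mode in zip(row, modes[a])) / len(row)
        for row, a in zip(data, assign)
    ]


percents = mode_match_percent([(1, 1), (1, 1), (1, 2)], [0, 0, 0])
```

In this example the cluster modes are (1, 1), so the first two observations match on 100% of their variables and the third on 50%.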



(4090) Classify each observation based on results obtained in
activity (4045), (4065), and/or (4085)



Fig. 4

Method 5 - Segmentation-on-the-Fly

(5100) Obtain dataset for observations,
dataset having variables, possible
values, provided values, and
corresponding cluster assignments






(5200) Conditional function: Determine if
cluster weights are needed to meet
coverage or efficiency objectives



For this flow chart, let the total set
of variables be defined as {K} and
the desired maximum # of
variables to be used as k.



Underweighting is used to implement
efficiency, while overweighting is used to
implement coverage



(5210) If no weights are 
needed, proceed to (5300) 



(5220) Conditional function:
Assign each cluster a weight
using the decision rules in
(5221), (5222), (5223)



All clusters must be
assigned weights,
regardless of magnitude



(5221) If cluster is to be
unweighted, set weight
as w = 1



(5222) If cluster is to be 
overweighted, set weight 
as w > 1 



(5223) If cluster is to be 
underweighted, set 
weight as 0 < w < 1 



(5300) Begin approximating the clustering solution
developed using Method 2 by developing clustering
solutions that employ only 1 variable
from {K} in each.



In other words, if there are K variables, then K
optimized solutions are created, 1 per k in {K}



(5310) Create a "dummy" variable for each
cluster so that if there are M clusters, then
M variables are created



Let $Y_{im}$ designate a
"dummy" variable for
observation $i$ in cluster $m$



(5320) Conditional function: populate M variables
per observation per clustering solution



Let i be a member of the set of
observations {R}, k be a member of
the set of variables {K}, and m be a
member of the set of clusters {M}



(5321) If observation $i$ is in
cluster $m$, then $Y_{im} = 1$



(5322) If observation $i$ is
not in cluster $m$, then $Y_{im} = 0$



(5330) Store all values for M
variables for all observations
for each clustering solution

So the total number of variables
created per observation per
cluster is ^



(5340) Create a "dummy" variable for every
variable-possible value combination so that
if there are $L_k$ possible answers to variable
$k$, then $L_k$ "dummy" variables are created
for each observation $i$ for each clustering
solution



(5350) Conditional function: for observation $i$
and variable $k$, populate $L_k$ variables per
clustering solution



(5351) If observation $i$
gives the $\ell$-th answer for
variable $k$, then $X_{i \ell k} = 1$



(5352) If observation $i$
does not give the $\ell$-th
answer for variable $k$, then $X_{i \ell k} = 0$



Let $X_{i \ell k}$ designate a
"dummy" variable for
observation $i$ who can
answer possible value $\ell$ for
variable $k$



(5360) Store all values for $K(L_k)$ variables
for all observations per clustering solution






(5370) For each cluster in {M}, use ordinary
least squares to regress all $Y_{im}$ for all
observations in {R} per clustering solution

Regression occurs on:
$\{1, \{X_{i \ell k} : k \in K, 1 \le \ell \le L_k - 1\}\}$



so that we can generate a linear approximation to the
probability that an observation with a particular set of
answers to the variables in {K} is in a particular cluster
within {M}



Index is created by
multiplying the linear
approximation to the
probability of an
observation's cluster
assignment by that
cluster's specified weight



(5380) Conditional function: construct a "simpler"
clustering solution to the one generated using
Method 2 (Bestfit Clustering)



By "simpler", what is meant is an
approximation of the "actual"
clustering solution using the
specified constraints, in this case
using only 1 variable within {K}



(5381) If weights were specified
in (5200), create an index for
each observation's cluster
association



(5382) If weights were not specified in (5200), assign
each observation in {R} to the m-th cluster in {M} that
gives the highest value of the linear approximation to
the probabilities of being in any of the members of {M}
as calculated in (5370) for each clustering solution



(5383) Assign each observation in {R} to the
m-th cluster in {M} that gives the highest
index value for each clustering solution



(5390) Store the outputs of regression (i.e. coefficients and
constants), variable IDs, observation IDs, "approximated" cluster
assignments, and actual cluster assignments for all clustering
solutions
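
For the one-variable case in (5300), the regression in (5370) has a simple closed form that makes a convenient sketch: ordinary least squares on an intercept plus dummies for a single categorical variable is a saturated model, so its fitted value for cluster m given answer a is just the share of observations answering a that belong to m. The code below uses that shortcut instead of a general regression routine; the function name, argument layout, and weight dictionary are illustrative assumptions.

```python
from collections import Counter


def approximate_clusters(answers, clusters, weights=None):
    """Approximate cluster assignments from one categorical variable.

    answers[i]: observation i's answer to the single variable k
    clusters[i]: its actual cluster from Method 2
    weights: optional {cluster: w}; omitted means every w = 1
    """
    pair_counts = Counter(zip(answers, clusters))
    answer_counts = Counter(answers)
    labels = sorted(set(clusters))
    weights = weights or {m: 1.0 for m in labels}
    rule = {}
    for a in answer_counts:
        # Fitted value of the cluster dummy given answer a, times the weight.
        index = {m: pair_counts[a, m] / answer_counts[a] * weights[m]
                 for m in labels}
        rule[a] = max(labels, key=lambda m: index[m])    # (5382) / (5383)
    return [rule[a] for a in answers], rule


answers = ["y", "y", "y", "n", "n", "n"]
actual = [0, 0, 1, 1, 1, 0]
approx, rule = approximate_clusters(answers, actual)
weighted, weighted_rule = approximate_clusters(answers, actual, {0: 1.0, 1: 3.0})
```

Without weights, "y" maps to cluster 0 and "n" to cluster 1. Overweighting cluster 1 by a factor of 3 pulls both answers into cluster 1, which illustrates how overweighting trades accuracy for coverage of a chosen cluster. With more than one variable the fitted values no longer reduce to group shares, and a general least-squares solver would be needed.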



To Fig. 5c 



Method 5 - Segmentation-on-the-Fly - Continued




FROM (5390) 





Accuracy score =
[# of observations whose
approximated and actual
cluster assignments are
identical] divided by [the total
# of observations in {R}]


(5400) Calculate accuracy 
scores for the results 
stored in (5390) for all 
clustering solutions 









(5410) Select the stored 
solution from (5390) that 
maximizes accuracy score 
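
The accuracy score in (5400) and the selection in (5410) can be sketched as below. The `solutions` dictionary, mapping a variable ID to its approximated assignments, is a hypothetical container for what (5390) stores.

```python
def accuracy_score(approximated, actual):
    """Share of observations whose approximated and actual clusters agree."""
    matches = sum(a == b for a, b in zip(approximated, actual))
    return matches / len(actual)


def best_solution(solutions, actual):
    """(5410): ID of the stored solution with the highest accuracy score."""
    return max(solutions, key=lambda k: accuracy_score(solutions[k], actual))


actual = [0, 0, 1, 1]
solutions = {"q1": [0, 0, 1, 0], "q2": [0, 1, 1, 0]}
winner = best_solution(solutions, actual)
```

Here "q1" agrees with the actual clusters on 3 of 4 observations and "q2" on 2 of 4, so "q1" is selected.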




(5411) Store the outputs of regression (i.e. coefficients and constants),
variable IDs, observation IDs, "approximated" cluster assignments, and
actual cluster assignments for the solution that was selected in (5410)




(5420) Begin approximating clustering 
solutions using only 2 variables in each 




(5421) Hold variable k from
(5411) constant and now
execute (5310) through (5400)
for all possible pairs of k from
(5411) and the
(k + 1)-th variable




(5422) Refine winning 2-variable solution
from (5421)



The pattern in
(5423) increases
serially as the
number of variables
used to approximate
the clustering
solution from
Method 2 increases
serially



(5423) Hold the
(k + 1)-th variable from (5421) constant and now
execute (5310) through (5411) for all possible pairs
of the (k + 1)-th variable from (5421) and the
remaining variables in {K}, excluding the (k + 1)-th
variable and variable k identified in (5411)



(5430) Continue looping (5420) through (5423) by sequentially
increasing the number of variables used in (5420) at the beginning of
each loop until a maximized solution (in terms of accuracy score) is
identified for a simpler clustering solution that uses k variables to
approximate the clustering solution identified in Method 2



(5440) Select and store the outputs of regression, variable IDs, "approximated" cluster
assignments (and corresponding observation IDs), actual cluster assignments (and
corresponding observation IDs), and accuracy scores for only the maximized solutions
for all solutions created up through and including k variables



Therefore if the (k + 2)-th
variable is added to the pair
of the k-th and (k + 1)-th
variables to create a
clustering solution that best
approximates the objective
function (i.e. original
clustering structure
developed in Method 2), then
in the refining process
(5422), the
(k + 2)-th variable is held
constant while the k-th and
(k + 1)-th variables are
replaced with all remaining
variables to test the triplet of
variables that best
approximates the results of
Method 2



(5450) Dump (5440) into a file with usable data format
(e.g. ASCII, .txt, .csv, .prn)



(5460) Terminate 



Fig. 5c



Method 6 - Behavioral Segment Scoring 



(6100) For each observation, obtain
dataset, each dataset having variables,
possible values, provided values, and
corresponding cluster assignments
(developed in Method 2)







(6200) Transform all continuous variables
into categorical or scalar forms






(6300) Conditional function: Refine
dataset to facilitate analysis



Dataset can consist of 
any combination of 

scalar, categorical, and 
continuous variables 



Analyze distribution boundaries within a
series of ranges to find the boundaries that
create distributions as close to normal as possible;
linear optimization is the most efficient
method for executing this









(6310) If dataset
has <= 100
variables, proceed
to (6400)




(6320) If dataset
has > 100
variables, reduce
dataset as much
as possible



(6321) Execute the following 
analyses: 

- log scores 

- tree analysis 

- regression 

- discriminant analysis 










(6322) Remove variables
identified in any 3 of 4 of the
techniques in (6321) as
"non-contributors"/"insignificant"
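
The removal rule in (6322) is a simple vote across the four techniques of (6321), sketched below. The sketch reads "any 3 of 4" as "at least 3", which is an interpretation rather than something the flow chart states, and the `flags` structure and technique labels are hypothetical.

```python
def variables_to_remove(flags, min_votes=3):
    """Variables marked as non-contributors by at least min_votes techniques.

    flags: {technique name: set of variables that technique marked}
    """
    votes = {}
    for flagged in flags.values():
        for v in flagged:
            votes[v] = votes.get(v, 0) + 1
    return {v for v, n in votes.items() if n >= min_votes}


flags = {
    "log scores": {"a", "b"},
    "tree analysis": {"a", "b"},
    "regression": {"a"},
    "discriminant analysis": {"a", "b", "c"},
}
removed = variables_to_remove(flags)
```

Variable "a" is flagged by all four techniques and "b" by three, so both are removed, while "c" is flagged only once and stays in the dataset.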







These are standard
statistical techniques that
can be done in a
mathematical
programming language
like Fortran or a statistical
software package like
SPSS or SAS



Although <= 100 variables
is ideal (in terms of
computational efficiency),
do not force an arbitrary
"cut-off" to ensure <= 100
variables are used in
the dataset




Fig. 6a 



Method 6 - Behavioral Segment Scoring - Continued 



FROM (6310) or (6322) 



(6400) Conditional function: Specify
maximum # of behavioral variables to be
used in solution set depending on
computational and/or time constraints



(6410) If there is a 
computational or time 
constraint, select a 
maximum # of behavioral 
variables to be used < 
total # of behavioral 
variables in the dataset 



(6420) If there is no
computational or time
constraint, make
maximum # of behavioral
variables to be used =
total # of behavioral
variables in the dataset







(6500) Conditional function: Determine if
cluster weights are needed to meet
marketing coverage or marketing
efficiency objectives



(6510) If no weights are 
needed, proceed to (6600) 



(6520) Conditional function:
Assign each cluster a weight
using the decision rules in
(6521), (6522), (6523)



(6521) If cluster is to be 
unweighted, set weight 
as w = 1 



All clusters must be
assigned weights,
regardless of magnitude



(6522) If cluster is to be
overweighted, set weight
as w > 1



(6523) If cluster is to be
underweighted, set
weight as 0 < w < 1



(6600) Execute steps
(5300) to
(5450) (of Method 5)



(6700) Terminate



Fig. 6b



Analyze distribution
boundaries within a series of
ranges to find the
boundaries that create
distributions as close to normal as
possible; linear optimization
is the most efficient method
for executing this



(7100) Conditional function: Ascertain 
use of panel data 



(7110) If using panel data as
objective measures of
behaviors to input into
clustering executed using
Method 2, follow this path



(7111) Refine data
collection instrument (e.g.
survey) using Invention I
(i.e. Babbitt Score)



(7112) Field data collection
instrument within panel (e.g.
IRI, Nielsen, Simmons,
comSCORE, MRI)



This is needed to ensure
that the clustering solution
developed using this
dataset is truly
representative of a
market/industry and not just
the function of an
idiosyncratic group of the
overall population








(7113) Extract observations
from collected data to
assemble a dataset that
reflects a category's/
industry's underlying
demographics







(7114) Obtain dataset for 
extracted observations, 
dataset having variables, 
possible values, and 
provided values 



(7115) Append panel-based 
behavioral variables to each 
corresponding observation 
in dataset 



(7116) Transform any panel
variables that are
continuous into categorical
or scalar variables



(7117) Input dataset from
(7116) into (2420) of Bestfit
Clustering and proceed with
clustering



(7120) If using panel data for
post-clustering analyses
(e.g. tracking, promotion,
positioning), follow this path



These 2 paths are not
exclusive, but cannot be
executed simultaneously.
However, path (7110) can
feed into path (7121).



(7121) Conditional function: 
Data collection 



(7121.1) If dataset
developed using
path (7110), proceed
to (7122.1)



(7121.2) Refine data
collection instrument (e.g.
survey) using Babbitt Score



(7122) Execute 
Methods 2 
through 4 



(7123) Execute 
Methods 2 
through 5 



(7123.1) Use typing tool developed in (7123)
to cluster score a representative sample of the
panel's members using an expedient contact
channel (e.g. outbound telephone, e-mail/
electronic surveys, mail-based surveys)



(7124) Execute cluster-level
analyses using panel data



(7125) Terminate 



(7118) Terminate



Fig. 7



Method 8 - Overall Segment-Based Marketing Process 



(8100) Develop and Field 
Pilot Survey 



(8200) Refine Survey 



Babbitt Score 
Bestfit Clustering 



(8300) Field Full
Enumeration Survey



Append panel variables and/or other
respondent-level behaviors (if applicable)



If applicable 



(8850) Panel Analysis 



(8900) Develop Segment 
Insights 



(8950) Develop and Execute 
Marketing Ideas 



(8400) Clean Data 



(8500) Create Clusters 



(8600) Refine Clusters 



(8800) Append Panel 
Variables 



Composition Analysis 



Babbitt Score 

Champion/Challenger Cluster Refinement 
Panel Analysis 



Composition Analysis 



(8700) Segment-on-the-Fly 



(8750) Behavioral Segment 
Scoring 



Fig. 8


(9500) I/O 








(9400) Instructions 




(9300) Memory 








(9200) Processor 








(9100) Network 
Interface 



Method 10 - Fitness Score Calculation



(10010) Calculate modes of given values for all 
variables in {K} for cluster n 



(10020) Store modes, corresponding
variable IDs, corresponding cluster ID



Where $n \in \{N\} \in \{I\}$

N consists of a finite # of clusters > 0.
I = domain of integers



(10030) Calculate modes of given values for all variables 
in {K} for cluster 



(10040) Store modes, corresponding
variable IDs, corresponding cluster ID



(10050) Conditional function: Assess 
the number of clusters for which 
modes have been calculated 



(10060) If the number of 
clusters for which modes 
have been calculated = N, 
proceed to (10080) 



(10070) If the number of
clusters for which modes
have been calculated < N,
return to (10030)



Where $k \in \{K\} \in \{I\}$

K consists of a finite # of
variables > 0.
I = domain of integers



Where $i \in \{R\} \in \{I\}$



(10080) For each cluster, compare the
value provided by each constituent
member for variable k to the cluster's
mode for variable k



(10090) Conditional function: compare
value of $i_n$ for $k$ to the mode of $k_n$.



R consists of the set of observations,
whose total number of
constituents is > 0.
I = domain of integers

Where $i_n$ is a member of cluster n



(10100) If equal, set
$V_{kni} = 1$



(10110) If not equal,
set $V_{kni} = 0$






(10120) Store score $V_{kni}$



$k_n$ is the variable k as answered by
cluster n



Where $V_{kni}$ = score for observation i
who is in cluster n and has provided
an answer value for variable k



(10130) Conditional function: adjust
$V_{kni}$ by indicated weight



(10140) If weight was
specified, multiply $V_{kni}$
by corresponding
weight for k



(10150) If weight was
not specified, multiply
$V_{kni}$ by 1



(10160) Store $V_{kni}$ from (10140) or
(10150)



(10170) Repeat (10080) to (10160) until a
score V is calculated for all observations in
their respective clusters for all variables k



(10200) Sum all scores V for all
observations across all variables



(10300) Store fitness score 
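
The whole of Method 10 reduces to a weighted count of mode matches, which can be sketched as follows. The function name, the row-per-observation data layout, and the per-variable weight list are assumptions for illustration; ties between equally common values are broken by whichever value `Counter.most_common` returns first, a detail the flow chart leaves open.

```python
from collections import Counter


def fitness_score(data, assign, weights=None):
    """Sum of weighted scores V over all clusters, variables, observations.

    data[i]: tuple of values provided by observation i, one per variable
    assign[i]: cluster of observation i
    weights: optional per-variable weights; omitted means multiply by 1
    """
    n_vars = len(data[0])
    weights = weights or [1.0] * n_vars
    total = 0.0
    for n in set(assign):
        members = [row for row, a in zip(data, assign) if a == n]
        for k in range(n_vars):
            mode = Counter(row[k] for row in members).most_common(1)[0][0]
            for row in members:
                v = 1 if row[k] == mode else 0    # (10100) / (10110)
                total += v * weights[k]           # (10140) / (10150)
    return total                                  # (10200)


data = [("a", "x"), ("a", "y"), ("b", "y")]
assign = [0, 0, 1]
unweighted = fitness_score(data, assign)
weighted = fitness_score(data, assign, weights=[2.0, 1.0])
```

In this example cluster 0 contributes 2 matches on the first variable and 1 on the second, and the single-member cluster 1 contributes 2, for an unweighted score of 5. Doubling the weight on the first variable raises the score to 8.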



(10400) Terminate



This is the fitness score